Adds a new `stringAgg` aggregate function that concatenates string values within groups with a configurable separator and ordering. Uses a two-tier architecture: a stateful incremental path with O(log n) binary search for efficient delta updates, and a stateless fallback for simpler use cases. Includes query builder overloads, compiler integration, docs, and comprehensive test coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Size Change: +272 B (+0.25%) Total Size: 110 kB

Size Change: 0 B Total Size: 4.23 kB
PR #1382 Review: stringAgg aggregate function
Overall assessment
The implementation is sound for its intended use cases and the tests are thorough. But there are real architectural concerns, one potential correctness issue, and some unnecessary complexity worth discussing.
1. It's not truly incremental — the reduce function is O(n) every time
The reduce function rebuilds nextEntriesByKey by scanning all entries in the Index for the group on every invocation:
```ts
// groupBy.ts ~line 430-457
const nextEntriesByKey = new Map<...>()
for (const [entry, multiplicity] of values) { // ← ALL accumulated values
  if (entry.rowKey == null || multiplicity <= 0 || entry.value == null) continue
  nextEntriesByKey.set(entry.rowKey, {...})
}
```

The "incremental" benefit is only in steps 2 and 3 — maintaining the sorted array and doing fast-path text splicing. Step 1 (building the target state) is always O(k), where k = entries in the Index for that group.
This is a limitation of the ReduceOperator contract (it always passes all accumulated values), so it's not something stringAgg can easily avoid. But the PR description's emphasis on "O(log n) binary search" and "fast-path string slicing" is somewhat misleading — those optimizations only help after the O(k) scan has already happened.
The good news: the Index does consolidate entries via content hashing (MurmurHash in packages/db-ivm/src/hashing/hash.ts). When a row is removed, the +1 and -1 entries for the same content cancel out and are deleted. So k ≈ number of active rows in the group, not historical total. This makes the O(k) scan reasonable in practice.
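A minimal model of that cancellation behavior (illustrative only; the real Index hashes content with MurmurHash rather than JSON-stringifying it, and the names here are hypothetical):

```typescript
// Toy sketch of multiset consolidation by content key: a +1 and a -1 for
// identical content cancel out and the entry is deleted, so the per-group
// entry count tracks active rows, not historical churn.
type Entry = { rowKey: string; value: string }

function consolidate(
  changes: Array<[Entry, number]>
): Map<string, { entry: Entry; multiplicity: number }> {
  const byContent = new Map<string, { entry: Entry; multiplicity: number }>()
  for (const [entry, multiplicity] of changes) {
    const key = JSON.stringify(entry) // stand-in for content hashing
    const existing = byContent.get(key)
    const next = (existing?.multiplicity ?? 0) + multiplicity
    if (next === 0) byContent.delete(key) // insert + delete cancel out
    else byContent.set(key, { entry, multiplicity: next })
  }
  return byContent
}

const removed: Entry = { rowKey: `r1`, value: `hello` }
const state = consolidate([
  [removed, 1], // insert
  [{ rowKey: `r2`, value: `world` }, 1],
  [removed, -1], // later delete cancels the earlier insert entirely
])
// state now holds a single entry, for r2
```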
2. Correctness depends on a fragile invariant: "last positive-multiplicity entry wins"
This is my biggest concern. The code builds the target state by iterating Index entries and setting nextEntriesByKey[rowKey] for each positive-multiplicity entry, with last-write-wins:
```ts
for (const [entry, multiplicity] of values) {
  if (multiplicity <= 0 ...) continue
  nextEntriesByKey.set(entry.rowKey, {...}) // last positive wins
}
```

Compare this to how existing aggregates work:
- `sum`: `total += value * multiplicity` — algebraically correct regardless of entry order, handles negative multiplicities naturally
- `count`: `totalCount += nullMultiplier * multiplicity` — same: algebraic
- `stringAgg`: skips negative multiplicities entirely — correct only if the Index properly consolidates
If the Index ever fails to consolidate (e.g., hash collision between two different pre-mapped objects, or a bug in consolidation), stringAgg would silently include ghost entries that should have been removed. The algebraic aggregates would still be correct because value * (+1) + value * (-1) = 0.
This is a real fragility difference. The hash function is 32-bit MurmurHash — collisions are rare but not impossible, and when they happen, stringAgg would be the first aggregate to produce visibly wrong results.
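A toy illustration of the difference (hypothetical values, not the library's code):

```typescript
// Suppose consolidation failed and both the +1 and the -1 entry for a
// removed "ghost" value survive in the accumulated values.
const changes: Array<[string, number]> = [
  [`a`, 1],
  [`ghost`, 1],
  [`ghost`, -1], // removal entry that should have cancelled the +1
]

// Algebraic style (like count): multiplicities cancel, result stays correct.
const algebraicCount = changes.reduce((acc, [, m]) => acc + m, 0) // 1

// Skip-negatives style (like stringAgg's scan): the -1 is ignored,
// so the ghost value is silently kept in the result.
const kept = new Set<string>()
for (const [value, m] of changes) {
  if (m <= 0) continue
  kept.add(value)
}
// algebraicCount === 1, but kept contains both "a" and "ghost"
```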
3. Array.splice() is O(n) — the binary search doesn't help as much as claimed
The sorted array maintenance uses splice for both insert and remove:
```ts
// Insert
state.orderedEntries.splice(index, 0, entry) // O(n) array shift

// Remove
state.orderedEntries.splice(index, 1) // O(n) array shift
```

The O(log n) binary search finds the position, but the O(n) splice dominates. For a group with 10,000 entries, each insert/remove shifts thousands of array elements. A balanced BST or skip list would give true O(log n) insert/remove, but that's probably over-engineering for typical group sizes.
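The pattern in question, sketched with illustrative helper names (not the PR's actual code):

```typescript
// Classic lower-bound binary search: O(log n) probes to find the slot
// where `target` belongs in a sorted array.
function lowerBound(arr: Array<number>, target: number): number {
  let lo = 0
  let hi = arr.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    if (arr[mid]! < target) lo = mid + 1
    else hi = mid
  }
  return lo
}

const ordered: Array<number> = [1, 3, 5, 7]
// The search is logarithmic, but splice still shifts every element after
// the insertion point, so the overall insert is O(n).
ordered.splice(lowerBound(ordered, 4), 0, 4)
// ordered is now [1, 3, 4, 5, 7]
```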
4. Fast-path text splicing: clever but has wasted work
The head/tail text splicing optimization is well-implemented — it correctly uses exact value lengths rather than searching for separator positions, so it handles values containing the separator string correctly (as tested).
However, when textDirty is already true (from a middle-position change), subsequent remove/insert operations still perform fast-path string modifications that will be thrown away:
```ts
// If an earlier change set textDirty = true, this string work is wasted
if (index === entryCount - 1) {
  state.text = state.text.slice(0, state.text.length - suffixLength)
  return false // says "no rebuild needed" but textDirty is already true
}
```

Not a correctness issue — the rebuild at the end produces the right result. But it's unnecessary string allocation for batch updates that include a middle-position change.
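One possible cheap fix, sketched with hypothetical names (not the PR's code): guard the fast path on `textDirty` so a pending rebuild short-circuits the string work.

```typescript
// If a full rebuild is already pending, fast-path splicing is wasted
// allocation; bail out before touching the text.
type StringAggState = { text: string; textDirty: boolean }

function removeSuffix(state: StringAggState, suffixLength: number): void {
  if (state.textDirty) return // rebuild pending: skip the string work
  state.text = state.text.slice(0, state.text.length - suffixLength)
}

const state: StringAggState = { text: `a,b,c`, textDirty: true }
removeSuffix(state, 2)
// state.text is untouched; the eventual rebuild recomputes it anyway
```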
5. The stateful closure pattern is novel and creates lifecycle coupling
This is the first aggregate to use closure-based external state (groupStates Map). All other aggregates (sum, count, avg, min, max, median, mode) are stateless — they compute the result from the full set of values each time.
The stateful pattern requires the new cleanup callback:
```ts
type BasicAggregateFunction<T, R, V = unknown, Reduced = V> = {
  preMap: (data: T) => V
  reduce: (values: Array<[V, number]>, groupKey: string) => Reduced
  postMap?: (result: Reduced) => R
  cleanup?: (groupKey: string) => void // NEW: only needed by stringAgg
}
```

Plus the reduce function signature change to pass `groupKey` (also new). These are framework-level changes to support a single aggregate. The cleanup is correctly called when `totalMultiplicity <= 0`, which handles group deletion. But this creates a coupling between the aggregate's lifecycle and the reduce operator's — if the graph is ever recreated or the reduce operator is reset without calling cleanup, the stale `groupStates` entries would cause incorrect results.
6. Minor issues in the compiler integration
In packages/db/src/query/compiler/group-by.ts, the stringagg case disambiguates separator vs orderBy by checking expression type:
```ts
const separator =
  separatorOrOrderByExpr?.type === `val` &&
  typeof separatorOrOrderByExpr.value === `string`
    ? separatorOrOrderByExpr.value
    : ``
```

This means `stringAgg(col, "")` (an empty-string separator) works, but `stringAgg(col, someStringColumn)` treats `someStringColumn` as an orderBy expression rather than a separator (since it's a ref, not a val). That's the correct behavior per the API design — but it means you can't use a column reference as a dynamic separator. Worth documenting.
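The disambiguation rule can be sketched as follows (illustrative expression types and helper name, not the compiler's actual IR):

```typescript
// Literal string second argument => separator; anything else (including a
// ref to a string column) falls through to the default separator and is
// treated as an orderBy expression instead.
type Expr =
  | { type: `val`; value: unknown }
  | { type: `ref`; path: Array<string> }

function resolveSeparator(expr: Expr | undefined): string {
  return expr?.type === `val` && typeof expr.value === `string`
    ? expr.value
    : `` // refs and missing args get the empty-string default
}

// resolveSeparator({ type: `val`, value: `, ` })   -> ", "
// resolveSeparator({ type: `ref`, path: [`sep`] }) -> "" (treated as orderBy)
```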
7. The compareStringAggOrderValues comparison works but has edge cases
Comparing bigint and number via < / > works in JS. Comparing booleans gives false < true. These are fine. But comparing string vs number (e.g., if a user accidentally passes mixed types) gives unpredictable results: JS coerces the string to a number, which is NaN for non-numeric strings, and every comparison against NaN is false. The type system should prevent this in practice, but there's no runtime guard.
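A runnable illustration of the failure mode (the cast only silences the type checker, which would normally reject this comparison):

```typescript
// A non-numeric string coerces to NaN, so both relational checks are
// false and a </>-based comparator reports the values as "equal".
const mixed = `abc` as unknown as number
const lt = mixed < 5 // false: NaN < 5
const gt = mixed > 5 // false: NaN > 5
const cmp = lt ? -1 : gt ? 1 : 0 // 0 — "equal", which can corrupt sort order
```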
Summary
| Aspect | Assessment |
|---|---|
| Correctness | Sound under normal conditions; fragile vs hash collisions |
| Incremental benefit | Real but limited — O(k) scan is unavoidable, splice is O(n) |
| Best use case | Append-only streams (LLM chunks) where tail-insert fast-path shines |
| Code complexity | High for the benefit; the stateful pattern + cleanup adds framework-level changes |
| Test coverage | Excellent — covers inserts, removes, updates, reordering, group deletion/recreation, separator-in-value, fallback path |
The main question I have: is the stateful incremental approach worth the complexity? For the LLM streaming use case (appending to the end), the fast-path tail insert avoids an O(total_text_length) rebuild per chunk, which is genuinely valuable. But for other use cases (random inserts/removals in the middle), it degrades to a full rebuild anyway, and the O(k) scan + O(n) splice overhead means it's not dramatically faster than a simple "sort and join" on each call.
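For reference, the "simple sort and join" baseline amounts to something like this (a sketch with an assumed row shape, not code from the PR):

```typescript
// Stateless alternative: on every update, sort the group's active rows by
// their order key and join. O(k log k + total text length) per call, with
// no per-group state to maintain or clean up.
type Row = { orderKey: number; value: string }

function stringAggSimple(rows: Array<Row>, separator: string): string {
  return [...rows]
    .sort((a, b) => a.orderKey - b.orderKey)
    .map((row) => row.value)
    .join(separator)
}

const result = stringAggSimple(
  [
    { orderKey: 2, value: `world` },
    { orderKey: 1, value: `hello` },
  ],
  `, `
)
// result === "hello, world"
```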
Summary
Adds a `stringAgg` aggregate function that concatenates string values within groups, with configurable separators and ordering. Available across the full stack: query builder, compiler, and IVM engine.

Approach
Two-tier IVM architecture:
- Stateful incremental path (with `rowKeyExtractor`): maintains per-group sorted state with O(log n) binary search for efficient delta updates
- Stateless fallback (no `rowKeyExtractor`): simple sort-and-join on each update. Used when row identity isn't available.

Query builder overloads disambiguate the flexible API surface.
The compiler distinguishes separator (literal string) from orderBy (column reference) by expression type, and always provides a `rowKeyExtractor` to enable the incremental path.

Key invariants
- `entriesByKey` and `orderedEntries` must stay synchronized — `removeStringAggEntry` now throws on desynchronization rather than silently returning
- Middle-position changes set `textDirty` for a full rebuild
- Null values are skipped (Postgres `string_agg` semantics)

Non-goals
Verification
All 140 tests pass, including new coverage for:

- inserts, removes, updates, and reordering within groups
- group deletion and recreation
- separator strings appearing inside values
- the fallback path (no `rowKeyExtractor`)

Files changed
- `packages/db-ivm/src/operators/groupBy.ts` — `stringAgg` implementation with binary search, incremental text maintenance, and fallback path
- `packages/db-ivm/src/operators/reduce.ts` — `groupKey` parameter and `cleanup` callback support
- `packages/db/src/query/builder/functions.ts` — `stringAgg` builder with 4 overloads and `OrderByLike` type
- `packages/db/src/query/compiler/group-by.ts` — compiler integration for `stringAgg`
- `packages/db/src/query/index.ts` — `stringAgg`
- `docs/guides/live-queries.md` — `stringAgg` docs

🤖 Generated with Claude Code